AITopics | evaluation partner

ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination

Neural Information Processing Systems | Mar-20-2026, 15:31:49 GMT

artificial intelligence, machine learning, reinforcement learning, (8 more...)


Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.59)


ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination

Wang, Xihuai

Neural Information Processing Systems | Feb-13-2026, 12:55:22 GMT

The significant difference between the deployment-time partners' distribution and the training partners' distribution determined by the training algorithm makes ZSC a unique out-of-distribution (OOD) generalization challenge.

artificial intelligence, machine learning, reinforcement learning, (17 more...)


Country:

Asia > China > Shanghai > Shanghai (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > Portugal (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)


ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination

Wang, Xihuai

Neural Information Processing Systems | Oct-10-2025, 02:49:06 GMT

The significant difference between the deployment-time partners' distribution and the training partners' distribution determined by the training algorithm makes ZSC a unique out-of-distribution (OOD) generalization challenge.

agent, algorithm, evaluation partner, (15 more...)


Country:

Asia > China > Shanghai > Shanghai (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Europe > Portugal (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)


ZSC-Eval: An Evaluation Toolkit and Benchmark for Multi-agent Zero-shot Coordination

Neural Information Processing Systems | May-27-2025, 01:29:24 GMT

Zero-shot coordination (ZSC) is a new cooperative multi-agent reinforcement learning (MARL) challenge that aims to train an ego agent to work with diverse, unseen partners during deployment. The significant difference between the deployment-time partners' distribution and the training partners' distribution determined by the training algorithm makes ZSC a unique out-of-distribution (OOD) generalization challenge. The potential distribution gap between evaluation and deployment-time partners leads to inadequate evaluation, which is exacerbated by the lack of appropriate evaluation metrics. ZSC-Eval, an evaluation toolkit and benchmark for ZSC algorithms, consists of: 1) Generation of evaluation partner candidates through behavior-preferring rewards to approximate the deployment-time partners' distribution; 2) Selection of evaluation partners by Best-Response Diversity (BR-Div); 3) Measurement of generalization performance with various evaluation partners via the Best-Response Proximity (BR-Prox) metric. We use ZSC-Eval to benchmark ZSC algorithms in the Overcooked and Google Research Football environments and obtain novel empirical findings.
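
As a rough illustration of the third step, the sketch below computes a proximity-style score under an assumed definition: for each evaluation partner, the ego agent's return divided by the return of an approximate best response to that partner, then averaged over partners. The abstract names BR-Prox but does not give its formula, so the ratio, the mean aggregation, the function name `br_prox_scores`, and the return values are all assumptions made for illustration.

```python
from statistics import mean


def br_prox_scores(ego_returns, br_returns):
    """Per-partner ratio of ego return to best-response return.

    ASSUMPTION: this ratio is one plausible reading of "proximity to the
    best response"; it is not taken from the abstract.
    """
    if len(ego_returns) != len(br_returns):
        raise ValueError("need one ego return and one BR return per partner")
    scores = []
    for ego_ret, br_ret in zip(ego_returns, br_returns):
        if br_ret <= 0:
            raise ValueError("best-response return must be positive")
        scores.append(ego_ret / br_ret)
    return scores


# Hypothetical mean episode returns against three evaluation partners.
ego_returns = [120.0, 80.0, 150.0]   # ego agent paired with each partner
br_returns = [200.0, 160.0, 200.0]   # approximate best response to each partner

ratios = br_prox_scores(ego_returns, br_returns)
overall = mean(ratios)  # simple mean as the aggregate (also an assumption)
print(ratios, overall)
```

A score near 1 for a partner would mean the ego agent coordinates almost as well as an agent trained specifically for that partner. Reporting the per-partner ratios alongside the aggregate keeps poorly handled partners visible.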

large language model, machine learning, reinforcement learning, (9 more...)


Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.98)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.62)


Quantifying Zero-shot Coordination Capability with Behavior Preferring Partners

Wang, Xihuai, Zhang, Shao, Zhang, Wenhao, Dong, Wentao, Chen, Jingxiao, Wen, Ying, Zhang, Weinan

arXiv.org Artificial Intelligence | Oct-8-2023

Zero-shot coordination (ZSC) is a new challenge focusing on generalizing learned coordination skills to unseen partners. Existing methods train the ego agent with partners from pre-trained or evolving populations. The agent's ZSC capability is typically evaluated with a few evaluation partners, including humans and agents, and reported as mean returns. Current evaluation methods for ZSC capability still need improvement in constructing diverse evaluation partners and in comprehensively measuring ZSC capability. In this paper, we aim to create a reliable, comprehensive, and efficient evaluation method for ZSC capability. We formally define the ideal 'diversity-complete' evaluation partners and propose the best response (BR) diversity, which is the population diversity of the BRs to the partners, to approximate the ideal evaluation partners. We propose an evaluation workflow that includes the construction of 'diversity-complete' evaluation partners and a multidimensional metric, the Best Response Proximity (BR-Prox) metric. We re-evaluate strong ZSC methods in the Overcooked environment using the proposed evaluation workflow. Surprisingly, the results in some of the most used layouts fail to distinguish the performance of different ZSC methods. Moreover, the evaluated ZSC methods lack the ability to produce enough diverse and high-performing training partners. Our proposed evaluation workflow calls for a change in how we efficiently evaluate ZSC methods as a supplement to human evaluation.

Zero-shot Coordination (ZSC) is a new challenge in training an agent, called the ego agent, to coordinate with unseen partners in cooperative AI (Hu et al., 2020).
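
To make the idea of choosing partners by the diversity of their best responses concrete, here is a minimal sketch. It assumes each candidate partner's BR has already been summarized as a numeric feature vector, and it uses greedy farthest-point selection with Euclidean distance as a simple stand-in for a population-diversity objective. This is not the paper's BR diversity computation; the function `select_diverse_partners` and the feature values are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse_partners(br_features, k):
    """Greedily pick k partners whose BR feature vectors are far apart.

    ASSUMPTION: farthest-point selection is a stand-in for maximizing the
    population diversity of the BRs; the actual objective may differ.
    Starts from candidate 0, then repeatedly adds the candidate whose
    nearest already-selected neighbor is farthest away.
    """
    if not 0 < k <= len(br_features):
        raise ValueError("k must be between 1 and the number of candidates")
    selected = [0]
    while len(selected) < k:
        best_idx, best_dist = None, -1.0
        for i, feat in enumerate(br_features):
            if i in selected:
                continue
            nearest = min(euclidean(feat, br_features[j]) for j in selected)
            if nearest > best_dist:
                best_idx, best_dist = i, nearest
        selected.append(best_idx)
    return selected


# Hypothetical 2-D behavior features of the BRs to four candidate partners.
br_features = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.0, 1.0)]
chosen = select_diverse_partners(br_features, 3)
print(chosen)  # indices of the selected evaluation partners
```

Candidate 1 is skipped in this toy example because its BR behaves almost identically to the BR for candidate 0, so evaluating against both would add little information.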

agent, diversity, evaluation partner, (12 more...)


arXiv:2310.05208

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre:

Workflow (0.75)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.81)
